Abstract


We consider the problem of recovering items matching a partially specified pattern in multidimensional trees (quadtrees and k-d trees). We assume the traditional model where the data consist of independent and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the number of nodes $C_n(\xi)$ to visit in order to report the items matching a random query $\xi$, independent and uniformly distributed on $[0,1]$, satisfies $\mathbb{E}[C_n(\xi)] \sim \kappa n^{\beta}$, where $\kappa$ and $\beta$ are explicit constants. We develop an approach based on the analysis of the cost $C_n(x)$ of any fixed query $x \in [0,1]$, and give precise estimates for the variance and limit distribution of the cost $C_n(x)$. Our results permit us to describe a limit process for the costs $C_n(x)$ as $x$ varies in $[0,1]$; one of the consequences is that $\mathbb{E}[\max_{x\in[0,1]} C_n(x)] \sim \gamma n^{\beta}$; this settles a question of Devroye [Pers. Comm., 2000].
1 Introduction



Multidimensional databases arise in a number of contexts such as computer graphics, management of 
geographical data or statistical analysis. The question of retrieving the data matching a specified pattern 
is then of course of prime importance. If the pattern specifies all the data fields, the query can generally be answered in logarithmic time, and many precise analyses are available in this case [11, 13, 15, 18, 19]. We will be interested in the case when the pattern only constrains some of the data fields; we then talk of a partial match query.

The first investigations of partial match queries, by Rivest [28], were based on digital structures. In a comparison-based setting, a few general-purpose data structures generalizing binary search trees can answer partial match queries, namely the quadtree [10], the k-d tree [1] and the relaxed k-d tree [7]. Aside from the interest that one might have in partial match for itself, there are numerous reasons that justify the precise quantification of the cost of such general search queries in comparison-based data structures. Multidimensional trees are indeed a data structure of choice for applications that range from collision detection in motion planning to mesh generation, which take advantage of the adaptive partition of space that they produce [17, 35]. For general references on multidimensional data structures and more details about their various applications, see the series of monographs by Samet [32, 33, 34]. The cost of partial match queries also appears in (hence influences) the complexity of a number of other geometric search questions such as range search [6] or rank selection [8].

In spite of their importance, the complexity results about partial match queries are not as precise as one could expect. In this paper, we provide novel analyses of the costs of partial match queries in some of the most important two-dimensional data structures. Most of the document will focus on the special case of quadtrees; in a final section, we discuss the case of k-d trees [1] and relaxed k-d trees [7].

Quadtrees and multidimensional search. The quadtree [10] manages multidimensional data by extending the divide-and-conquer approach of the binary search tree. Consider the point sequence $p_1, p_2, \dots, p_n \in [0,1]^2$. As we build the tree, regions of the unit square are associated with the nodes where the points are stored. Initially, the root is associated with the region $[0,1]^2$ and the data structure is empty. The first point $p_1$ is stored at the root, and divides the unit square into four regions $Q_1, \dots, Q_4$. Each region is assigned to a child of the root. More generally, when $i$ points have already been inserted, we have a set of $1 + 3i$ (lower-level) regions that cover the unit square. The point $p_{i+1}$ is stored in the node (say $u$) that corresponds to the region it falls in, and divides that region into four new regions that are assigned to the children of $u$. See Figure 1.

Figure 1: An example of a (point) quadtree: on the left, the partition of the unit square induced by the tree data structure on the right (the children are ordered according to the numbering of the regions on the left). Answering the partial match query materialized by the dashed line on the left requires visiting the points/nodes coloured in red. Note that each of the visited nodes corresponds to a horizontal line that is crossed by the query.
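To make the construction and the cost concrete, here is a minimal Python sketch (our illustration, not code from the paper) that inserts points into a point quadtree and counts the nodes visited by a partial match query at $x = s$. It assumes half-open regions, so that a query with $s$ equal to a stored $x$-coordinate is sent to the right, matching the convention $s < U$ versus $s \ge U$; the child numbering 0 to 3 (left/below, left/above, right/below, right/above) and all names (`Node`, `build_quadtree`, `partial_match_cost`) are hypothetical choices.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    x: float
    y: float
    # children[q], q = 2*(x >= self.x) + (y >= self.y):
    # 0 = left/below, 1 = left/above, 2 = right/below, 3 = right/above
    children: List[Optional["Node"]] = field(default_factory=lambda: [None] * 4)


def quadrant(node: Node, x: float, y: float) -> int:
    return 2 * (x >= node.x) + (y >= node.y)


def build_quadtree(points: List[Tuple[float, float]]) -> Optional[Node]:
    """Insert the points in order; each new point splits the region it falls in."""
    root: Optional[Node] = None
    for x, y in points:
        if root is None:
            root = Node(x, y)
            continue
        node = root
        while True:
            q = quadrant(node, x, y)
            if node.children[q] is None:
                node.children[q] = Node(x, y)
                break
            node = node.children[q]
    return root


def partial_match_cost(root: Optional[Node], s: float) -> int:
    """C_n(s): number of nodes whose region meets the vertical line {x = s}."""
    cost = 0
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        cost += 1
        # The line only enters the two left or the two right subregions of a node.
        for q in ((0, 1) if s < node.x else (2, 3)):
            child = node.children[q]
            if child is not None:
                stack.append(child)
    return cost


rng = random.Random(2012)
tree = build_quadtree([(rng.random(), rng.random()) for _ in range(1000)])
costs = {s: partial_match_cost(tree, s) for s in (0.1, 0.25, 0.5)}
```

The explicit stack avoids deep recursion; the cost of a query is the number of nodes pushed, since each visited node corresponds to one region crossed by the line.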

Analysis of partial match retrieval. For the analysis, we will focus on the model of random quadtrees, where the data points are independent and uniformly distributed in the unit square. In the present case, the data are just points, and the problem of partial match retrieval consists in reporting all the data with one of the coordinates (say the first) being $s \in [0,1]$. It is a simple observation that the number of nodes of the tree visited when performing the search is precisely $C_n(s)$, the number of regions in the quadtree that intersect a vertical line at $s$. The first analysis of partial match in quadtrees is due to Flajolet et al. [14] (after the pioneering work of Flajolet and Puech [12] in the case of k-d trees). They studied the singularities of a differential system for the generating functions of the partial match cost to prove that, for a random query $\xi$, independent of the tree and uniformly distributed on $[0,1]$,



$$\mathbb{E}[C_n(\xi)] \sim \kappa n^{\beta}, \qquad \text{where} \quad \kappa = \frac{\Gamma(2\beta+2)}{2\Gamma(\beta+1)^3}, \quad \beta = \frac{\sqrt{17}-3}{2}, \tag{1}$$

and $\Gamma(x)$ denotes the Gamma function $\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt$. This has since been strengthened by Chern and Hwang [3], who provided the order of the error term (together with the values of the leading constant in all dimensions). The most precise result is (6.2) there, saying that



$$\mathbb{E}[C_n(\xi)] = \kappa n^{\beta} - 1 + O(n^{\beta-1}). \tag{2}$$
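The constants in (1) are easy to evaluate numerically. The following Python sketch (our illustration) uses only the standard `math.gamma`; it relies on the fact that $\beta$ is the positive root of $\beta^2 + 3\beta - 2 = 0$ and evaluates the two-term approximation $\kappa n^{\beta} - 1$ from (2), whose error is $O(n^{\beta-1})$. The helper name `expected_cost_uniform` is hypothetical.

```python
import math

# beta: positive root of b^2 + 3b - 2 = 0, i.e. (sqrt(17) - 3) / 2
beta = (math.sqrt(17) - 3) / 2
# kappa from (1)
kappa = math.gamma(2 * beta + 2) / (2 * math.gamma(beta + 1) ** 3)


def expected_cost_uniform(n: int) -> float:
    """Two-term approximation of E[C_n(xi)] from (2); the neglected error is O(n^(beta - 1))."""
    return kappa * n ** beta - 1


print(f"beta = {beta:.7f}, kappa = {kappa:.6f}")
```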



To gain a refined understanding of the cost beyond the level of expectations, we pursue two directions. First, to justify that the expected value is a reasonable estimate of the cost, one would like a guarantee that the cost of partial match retrieval is actually close to its mean. However, deriving higher moments turns out to be more subtle than it seems. In particular, when the query line is random (as in the uniform case), although the four subtrees at the root really are independent given their sizes, the contributions of the two subtrees that do hit the query line are dependent: the relative location of the query line inside these two subtrees is again uniform, but unfortunately it is the same in both regions. This issue has not yet been addressed appropriately, and there is currently no result on the variance of, or higher moments for, $C_n(\xi)$.

The second issue lies in the very definition of the cost measure: even if the data follow some distribution (here uniform), should one really assume that the query also satisfies this distribution? In other words, should we focus on $C_n(\xi)$? Maybe not. But then, what distribution should one use for the query line?

One possible approach to overcome both problems is to consider the query line to be fixed and to study $C_n(s)$ for $s \in [0,1]$. This raises another problem: even if $s$ is fixed at the top level, as the search is performed, the relative location of the queries in the recursive calls varies from one node to another! Thus, in following this approach, one is led to consider the entire process $(C_n(s), s \in [0,1])$; this is the method we use here.

Recently, Curien and Joseph [4] obtained some results in this direction. They proved that for every fixed $s \in (0,1)$,



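$$\mathbb{E}[C_n(s)] \sim K_1 (s(1-s))^{\beta/2}\, n^{\beta}, \qquad K_1 = \frac{\Gamma(2\beta+2)\,\Gamma(\beta+2)}{2\Gamma(\beta+1)^3\,\Gamma(\beta/2+1)^2}. \tag{3}$$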



On the other hand, Flajolet et al. [14, 15] prove that, along the edge, one has $\mathbb{E}[C_n(0)] = \Theta(n^{\sqrt{2}-1}) = o(n^{\beta})$ (see also [4]). The behaviour near the $x$-coordinate $U$ of the first data point certainly resembles that along the edge, so that one has $\mathbb{E}[C_n(U)] = o(n^{\beta})$. This suggests that $C_n(s)$ should not be concentrated around its mean, and that $n^{-\beta}C_n(s)$ should converge to a non-trivial random variable as $n \to \infty$. This random variable would of course carry much information about the asymptotic properties of the cost of partial match queries in quadtrees. Below, we identify these limit random variables and obtain from them refined asymptotic information on the complexity of partial match queries in quadtrees.
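As a consistency check between (1) and (3), averaging $K_1(s(1-s))^{\beta/2}$ over a uniform query should give back $\kappa$, since $\int_0^1 (s(1-s))^{\beta/2}\,ds = B(\beta/2+1,\beta/2+1) = \Gamma(\beta/2+1)^2/\Gamma(\beta+2)$, with $B$ the Eulerian beta integral. The Python sketch below (our illustration; the names `beta_fn`, `mu1` and `average_over_uniform_query` are hypothetical) checks this identity both in closed form and by a midpoint-rule integral.

```python
import math

beta = (math.sqrt(17) - 3) / 2
gamma = math.gamma


def beta_fn(a: float, b: float) -> float:
    """Eulerian beta integral B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)


kappa = gamma(2 * beta + 2) / (2 * gamma(beta + 1) ** 3)
K1 = gamma(2 * beta + 2) * gamma(beta + 2) / (2 * gamma(beta + 1) ** 3 * gamma(beta / 2 + 1) ** 2)


def mu1(s: float) -> float:
    """Limit of n^(-beta) E[C_n(s)] according to (3)."""
    return K1 * (s * (1 - s)) ** (beta / 2)


def average_over_uniform_query(n_grid: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of mu1 over [0, 1]."""
    w = 1.0 / n_grid
    return w * sum(mu1((k + 0.5) * w) for k in range(n_grid))
```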

2 Main results and implications 

Our main contribution is to prove the following convergence result: 

Theorem 1. Let $C_n(s)$ be the cost of a partial match query at a fixed line $s$ in a random quadtree. Then there exists a random continuous function $Z$ such that, as $n \to \infty$,

$$\left(\frac{C_n(s)}{K_1 n^{\beta}},\ s \in [0,1]\right) \xrightarrow{d} \big(Z(s),\ s \in [0,1]\big). \tag{4}$$

This convergence in distribution holds in the Banach space $(\mathcal{D}[0,1], \|\cdot\|)$ of right-continuous functions with left limits (càdlàg) equipped with the supremum norm defined by $\|f\| = \sup_{s\in[0,1]} |f(s)|$.

Note that the convergence in (4) is stronger than the convergence in distribution of the finite-dimensional marginals

$$\left(\frac{C_n(s_1)}{K_1 n^{\beta}}, \frac{C_n(s_2)}{K_1 n^{\beta}}, \dots, \frac{C_n(s_k)}{K_1 n^{\beta}}\right) \xrightarrow{d} \big(Z(s_1), Z(s_2), \dots, Z(s_k)\big)$$

as $n \to \infty$, for any natural number $k$ and points $s_1, s_2, \dots, s_k \in [0,1]$ [see, e.g., 2]. Theorem 1 has a myriad of consequences in terms of estimates of the costs of partial match queries in random quadtrees. Of course, Theorem 1 would be of less practical interest if we could not characterize the distribution of the random function $Z$ (see Figure 2 for a simulation):

Proposition 2. The distribution of the random function $Z$ in (4) is a fixed point of the following recursive functional equation

$$Z(s) \stackrel{d}{=} \mathbf{1}_{\{s<U\}}\left[(UV)^{\beta} Z^{(1)}\!\left(\frac{s}{U}\right) + (U(1-V))^{\beta} Z^{(2)}\!\left(\frac{s}{U}\right)\right] + \mathbf{1}_{\{s\ge U\}}\left[((1-U)V)^{\beta} Z^{(3)}\!\left(\frac{s-U}{1-U}\right) + ((1-U)(1-V))^{\beta} Z^{(4)}\!\left(\frac{s-U}{1-U}\right)\right], \tag{5}$$

where $U$ and $V$ are independent $[0,1]$-uniform random variables and $Z^{(i)}$, $i = 1, \dots, 4$, are independent copies of the process $Z$, which are also independent of $U$ and $V$. Furthermore, $Z$ in (4) is the only solution of (5) such that $\mathbb{E}[Z(s)] = (s(1-s))^{\beta/2}$ for all $s \in [0,1]$ and $\mathbb{E}[\|Z\|^2] < \infty$.
Figure 2: A random quadtree on 1000 points and the corresponding partial match process on the right; in red we have shown the expected value.



This is indeed relevant since the convergence that implies Theorem 1 is strong enough to guarantee convergence of the variance of the costs of partial match queries. The following theorem for uniform queries $\xi$ is the direct extension of the pioneering work of Flajolet and Puech [12] and Flajolet et al. [14] on the cost of partial match queries at a uniform line in random multidimensional trees.

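Theorem 3. If $\xi$ is uniformly distributed on $[0,1]$, independent of $(C_n)$ and $Z$, then

$$\frac{C_n(\xi)}{K_1 n^{\beta}} \xrightarrow{d} Z(\xi)$$

in distribution with convergence of the first two moments. In particular,

$$\mathrm{Var}(C_n(\xi)) \sim K_4\, n^{2\beta}, \qquad \text{where } K_4 := K_1^2\,\mathrm{Var}(Z(\xi)) \approx 0.447363034.$$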

In particular, Theorem 3 identifies the asymptotic order of $\mathrm{Var}(C_n(\xi))$, which is to be compared with studies that neglected the dependence between the contributions of the subtrees mentioned above [20, 21, 23]. We also have an asymptotic expression for the variance of the cost at a fixed query:

Theorem 4. For all $s \in (0,1)$, as $n \to \infty$,

$$\mathrm{Var}(C_n(s)) \sim \left(2B(\beta+1,\beta+1)\,\frac{2\beta+1}{3(1-\beta)} - 1\right)(s(1-s))^{\beta}\, K_1^2\, n^{2\beta}. \tag{6}$$

Here, $B(a,b) := \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$ denotes the Eulerian beta integral ($a, b > 0$).
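The constants of Theorems 3 and 4 can be evaluated in a few lines from the second-moment computation of Section 4. In the Python sketch below (our illustration; `c2`, `var_Z_xi`, `K4` and `var_cost_fixed` are hypothetical names), $c_2 = \mathbb{E}[Z(s)^2]/(s(1-s))^{\beta}$ is the constant standing in front of $-1$ in (6), and $K_4 = K_1^2\,\mathrm{Var}(Z(\xi))$ with $\mathrm{Var}(Z(\xi)) = c_2 B(\beta+1,\beta+1) - B(\beta/2+1,\beta/2+1)^2$; it evaluates to approximately the value $0.447363$ quoted in Theorem 3.

```python
import math

g = math.gamma
beta = (math.sqrt(17) - 3) / 2


def B(a: float, b: float) -> float:
    """Eulerian beta integral via Gamma functions."""
    return g(a) * g(b) / g(a + b)


K1 = g(2 * beta + 2) * g(beta + 2) / (2 * g(beta + 1) ** 3 * g(beta / 2 + 1) ** 2)

# c2 = E[Z(s)^2] / h(s)^2 with h(s) = (s(1 - s))^(beta/2)
c2 = 2 * B(beta + 1, beta + 1) * (2 * beta + 1) / (3 * (1 - beta))

# Variance of Z at a uniform point and the constant K_4 of Theorem 3
var_Z_xi = c2 * B(beta + 1, beta + 1) - B(beta / 2 + 1, beta / 2 + 1) ** 2
K4 = K1 ** 2 * var_Z_xi


def var_cost_fixed(n: int, s: float) -> float:
    """Leading term of Var(C_n(s)) in (6): (c2 - 1) K1^2 (s(1 - s))^beta n^(2 beta)."""
    return (c2 - 1) * K1 ** 2 * (s * (1 - s)) ** beta * n ** (2 * beta)
```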



Some of the most striking consequences concern the cost of the worst query in a random plane quadtree. Note in particular that the supremum does not induce any extra logarithmic terms in the asymptotic cost.

Theorem 5. Let $S_n = \sup_{s\in[0,1]} C_n(s)$. Then, as $n \to \infty$,

$$\frac{S_n}{K_1 n^{\beta}} \xrightarrow{d} S := \sup_{s\in[0,1]} Z(s), \qquad \mathbb{E}[S_n] \sim K_1 n^{\beta}\,\mathbb{E}[S], \qquad \mathrm{Var}(S_n) \sim K_1^2 n^{2\beta}\,\mathrm{Var}(S).$$
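For a given point set, $S_n$ can be computed exactly. Each node's region projects onto an $x$-interval $[a_v, b_v)$ and $C_n(s)$ counts the intervals containing $s$, so $S_n$ is the maximum overlap of these $n$ intervals, which a sweep over the $2n$ sorted endpoints finds once the tree is built. The Python sketch below is our illustration of this observation, assuming the half-open convention in which $s$ equal to a stored $x$-coordinate goes to the right; the function names are hypothetical.

```python
import random
from typing import List, Tuple


def region_x_intervals(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Insert the points into a point quadtree (in the given order) and return,
    for every node, the x-projection [left, right) of its region."""
    nodes = []       # nodes[v] = (x, y, children); children[q] is a node index or -1
    intervals = []
    for x, y in points:
        left, right = 0.0, 1.0
        parent, q, v = -1, -1, (0 if nodes else -1)
        while v != -1:                  # walk down to the empty child slot
            nx, ny, children = nodes[v]
            if x < nx:
                right = nx              # the new region lies left of this node's point
            else:
                left = nx               # ... or on its right
            q = 2 * (x >= nx) + (y >= ny)
            parent, v = v, children[q]
        nodes.append((x, y, [-1, -1, -1, -1]))
        if parent != -1:
            nodes[parent][2][q] = len(nodes) - 1
        intervals.append((left, right))
    return intervals


def worst_query_cost(points: List[Tuple[float, float]]) -> int:
    """S_n = sup_s C_n(s): maximum number of region x-intervals containing a common s."""
    events = []
    for left, right in region_x_intervals(points):
        events.append((left, +1))       # the line x = s meets this region from s = left
        events.append((right, -1))      # up to, but excluding, s = right
    events.sort()                       # at equal coordinates, -1 is processed before +1
    best = depth = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best
```

Processing closing events before opening ones at a common coordinate implements the half-open intervals, so the running depth at each coordinate is exactly $C_n$ there.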



Finally, we note that the one-dimensional marginals of the limit process $(Z(s), s \in [0,1])$ are all the same up to a multiplicative constant.



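Theorem 6. There is a random variable $Z \ge 0$ such that for all $s \in [0,1]$,

$$Z(s) \stackrel{d}{=} (s(1-s))^{\beta/2}\, Z. \tag{7}$$

The distribution of $Z$ is characterized by its moments $c_m := \mathbb{E}[Z^m]$, $m \in \mathbb{N}$. They are given by $c_1 = 1$ and the recurrence

$$c_m = \frac{2(\beta m+1)}{(m-1)\big(2(m+1) - 3\beta m\big)} \sum_{\ell=1}^{m-1} \binom{m}{\ell} B\big(\beta\ell+1, \beta(m-\ell)+1\big)\, c_\ell\, c_{m-\ell}, \qquad m \ge 2.$$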
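The recurrence of Theorem 6 is straightforward to evaluate. One way to see where it comes from is to raise (5) to the power $m$, use $\mathbb{E}[Z(s)^m] = c_m (s(1-s))^{\beta m/2}$ and simplify with $\beta^2 = 2 - 3\beta$; for $m = 2$ it gives back $c_2 = 2B(\beta+1,\beta+1)(2\beta+1)/(3(1-\beta))$, the constant obtained in Section 4. The Python sketch below (our illustration; the function names are hypothetical) computes $c_m$ with memoization and checks that case.

```python
import math
from functools import lru_cache

beta = (math.sqrt(17) - 3) / 2


def beta_fn(a: float, b: float) -> float:
    """Eulerian beta integral B(a, b) via the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


@lru_cache(maxsize=None)
def moment(m: int) -> float:
    """c_m = E[Z^m] for the variable Z of Theorem 6 (with c_0 = c_1 = 1)."""
    if m <= 1:
        return 1.0
    total = sum(
        math.comb(m, l) * beta_fn(beta * l + 1, beta * (m - l) + 1) * moment(l) * moment(m - l)
        for l in range(1, m)
    )
    # 2(m + 1) - 3*beta*m = beta^2 * m + 2 > 0, so the prefactor is finite for m >= 2
    return 2 * (beta * m + 1) / ((m - 1) * (2 * (m + 1) - 3 * beta * m)) * total


c2_closed_form = 2 * beta_fn(beta + 1, beta + 1) * (2 * beta + 1) / (3 * (1 - beta))
```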

Plan of the paper. Our approach requires working with random functions; as one might expect, proving convergence in a space of functions involves a fair amount of unavoidable technicalities. Here, we try to keep the discussion at a rather high level, to avoid diluting the main ideas in an ocean of intricate details. In Section 3, we give an overview of our main tool, the contraction method. In Section 4, we identify the variance and the supremum of the limit process $Z$, and deduce the large-$n$ asymptotics for $C_n(s)$ in Theorems 3 and 5.

3 Contraction method: from the real line to functional spaces 

3.1 Overview 

The aim of this section is to give an overview of the method we employ to prove Theorem 1. The idea is very natural and relies on a contraction argument in a certain space of probability distributions. In the context of the analysis of performance of algorithms, the method was first employed by Rösler [29], who proved convergence in distribution for the rescaled total cost of the randomized version of quicksort. The method was then further developed by Rachev and Rüschendorf [27], Rösler [30], and later on in [5, 9, 22, 24, 25, 31], and has permitted numerous analyses in distribution for random discrete structures. So far, the method has mostly been used to analyze random variables taking real values, though a few applications on function spaces have been made, see [5, 9, 16]. Here we are interested in the function space $\mathcal{D}[0,1]$ with the uniform topology, but the main idea persists: (1) devise a recursive equation for the quantity of interest (here the process $(C_n(s), s \in [0,1])$); (2) prove that a properly rescaled version of the quantity converges to a fixed point of a certain map related to the recursive equation; (3) if the map is a contraction in a certain metric space, then the fixed point is unique and may be obtained by iteration. We now move on to the first step of this program.

Write $I_1^{(n)}, \dots, I_4^{(n)}$ for the numbers of points falling in the four regions created by the point stored at the root. Then, given the coordinates $(U, V)$ of the first data point, we have, cf. Figure 1,

$$\big(I_1^{(n)}, \dots, I_4^{(n)}\big) \stackrel{d}{=} \mathrm{Mult}\big(n-1;\ UV,\ U(1-V),\ (1-U)V,\ (1-U)(1-V)\big).$$

Observe that, for the cost inside a subregion, what matters is the location of the query line relative to the region. Thus a decomposition at the root yields the following recursive relation, for any $n \ge 1$,

$$C_n(s) \stackrel{d}{=} 1 + \mathbf{1}_{\{s<U\}}\left[C^{(1)}_{I_1^{(n)}}\!\left(\frac{s}{U}\right) + C^{(2)}_{I_2^{(n)}}\!\left(\frac{s}{U}\right)\right] + \mathbf{1}_{\{s\ge U\}}\left[C^{(3)}_{I_3^{(n)}}\!\left(\frac{s-U}{1-U}\right) + C^{(4)}_{I_4^{(n)}}\!\left(\frac{s-U}{1-U}\right)\right], \tag{8}$$

where $U, I_1^{(n)}, \dots, I_4^{(n)}$ are the quantities already introduced and $(C_k^{(1)}), \dots, (C_k^{(4)})$ are independent copies of the sequence $(C_k, k \ge 0)$, independent of $(U, V, I_1^{(n)}, \dots, I_4^{(n)})$. We stress that this equation does not only hold pointwise for fixed $s$ but also as càdlàg functions on the unit interval. The relation (8) is the fundamental equation for us.
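The recursion (8) also gives a direct way to sample $C_n(s)$ for one fixed $s$ without building the whole tree: only the two subtrees met by the line are expanded, and by the multinomial split above their sizes can be drawn as $I_1^{(n)} + I_2^{(n)} \sim \mathrm{Bin}(n-1, U)$ followed by $I_1^{(n)} \sim \mathrm{Bin}(I_1^{(n)} + I_2^{(n)}, V)$ (and symmetrically when $s \ge U$). The Python sketch below is our illustration of this idea; the binomial sampler simply sums Bernoulli trials, and all names are hypothetical.

```python
import random


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Bin(n, p) as a sum of Bernoulli trials (simple, O(n))."""
    return sum(rng.random() < p for _ in range(n))


def sample_cost(n: int, s: float, rng: random.Random) -> int:
    """One sample of C_n(s) drawn through the recursion (8).

    Given the root (U, V), only the two subtrees met by the line x = s matter:
    their total size is Bin(n - 1, U) (left) or Bin(n - 1, 1 - U) (right),
    and the lower one receives Bin(total, V) of those points.
    """
    if n == 0:
        return 0
    u, v = rng.random(), rng.random()
    if s < u:
        m = binomial(n - 1, u, rng)        # I_1 + I_2
        t = s / u                          # query position relative to [0, U]
    else:
        m = binomial(n - 1, 1 - u, rng)    # I_3 + I_4
        t = (s - u) / (1 - u)              # query position relative to [U, 1]
    low = binomial(m, v, rng)              # I_1 (resp. I_3)
    return 1 + sample_cost(low, t, rng) + sample_cost(m - low, t, rng)


rng = random.Random(1)
samples = [sample_cost(500, 0.5, rng) for _ in range(200)]
```

Monte Carlo averages such as `sum(samples) / len(samples)` can then be compared with the first-order prediction $K_1 (s(1-s))^{\beta/2} n^{\beta}$ of (3); we make no claim here about how quickly the agreement improves with $n$.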

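Letting $n \to \infty$ (formally) in (8) suggests that, if $n^{-\beta}C_n(s)$ does converge to a random variable $Z(s)$ in a sense to be made precise, then the distribution of the process $(Z(s), 0 \le s \le 1)$ should satisfy the following fixed-point equation

$$Z(s) \stackrel{d}{=} \mathbf{1}_{\{s<U\}}\left[(UV)^{\beta} Z^{(1)}\!\left(\frac{s}{U}\right) + (U(1-V))^{\beta} Z^{(2)}\!\left(\frac{s}{U}\right)\right] + \mathbf{1}_{\{s\ge U\}}\left[((1-U)V)^{\beta} Z^{(3)}\!\left(\frac{s-U}{1-U}\right) + ((1-U)(1-V))^{\beta} Z^{(4)}\!\left(\frac{s-U}{1-U}\right)\right], \tag{9}$$

where $U$ and $V$ are independent $[0,1]$-uniform random variables and $Z^{(i)}$, $i = 1, \dots, 4$, are independent copies of the process $Z$, which are also independent of $U$ and $V$.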

The last step leading to the fixed-point equation (9) now needs to be made rigorous. It is at this point that the contraction method enters the game. The distribution of a solution to our fixed-point equation (9) lies in the set of probability measures on the Banach space $(\mathcal{D}[0,1], \|\cdot\|)$, which is the set we have to endow with a metric. The recursive equation (8) is an example of the following, more general setting of random additive recurrences: let $(X_n)$ be $\mathcal{D}[0,1]$-valued random variables with

$$X_n \stackrel{d}{=} \sum_{r=1}^{K} A_r^{(n)}\big(X^{(r)}_{I_r^{(n)}}\big) + b^{(n)}, \qquad n \ge 1, \tag{10}$$

where $(A_1^{(n)}, \dots, A_K^{(n)})$ are random linear and continuous operators on $\mathcal{D}[0,1]$, $b^{(n)}$ is a $\mathcal{D}[0,1]$-valued random variable, $I_1^{(n)}, \dots, I_K^{(n)}$ are random integers between $0$ and $n-1$, and $(X_n^{(1)}), \dots, (X_n^{(K)})$ are distributed like $(X_n)$. Moreover, $(A_1^{(n)}, \dots, A_K^{(n)}, b^{(n)}, I_1^{(n)}, \dots, I_K^{(n)})$, $(X_n^{(1)}), \dots, (X_n^{(K)})$ are independent.

To establish Theorem 1 as a special case of this setting, we use Proposition 7 below, which is covered by the forthcoming paper [26]. We first state the conditions needed to deal with the general recurrence (10); we will then justify that they can indeed be verified in the case of the cost of partial match queries. Consider the following assumptions, where, for a random linear operator $A$, we write $\|A\|_2 := \mathbb{E}\big[\|A\|_{\mathrm{op}}^2\big]^{1/2}$ with $\|A\|_{\mathrm{op}} := \sup_{\|x\|=1}\|A(x)\|$. Suppose $(X_n)$ obeys (10) and

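(A1) Convergence and contraction. We have $\|A_r^{(n)}\|_2, \|b^{(n)}\|_2 < \infty$ for all $r = 1, \dots, K$ and $n \ge 0$, and there exist random operators $A_1, \dots, A_K$ on $\mathcal{D}[0,1]$ and a $\mathcal{D}[0,1]$-valued random variable $b$ with, for some positive sequence $R(n) \downarrow 0$, as $n \to \infty$,

$$\|b^{(n)} - b\|_2 + \sum_{r=1}^{K}\Big(\|A_r^{(n)} - A_r\|_2 + \big\|\mathbf{1}_{\{I_r^{(n)} < n_0\}}A_r^{(n)}\big\|_2\Big) = O(R(n)), \tag{11}$$

and for all $\ell \in \mathbb{N}$,

$$\mathbb{E}\Big[\mathbf{1}_{\{I_r^{(n)} \in \{0,\dots,\ell\}\cup\{n\}\}}\,\|A_r^{(n)}\|_{\mathrm{op}}^2\Big] \to 0,$$

and

$$L^* := \limsup_{n\to\infty}\mathbb{E}\left[\sum_{r=1}^{K}\|A_r^{(n)}\|_{\mathrm{op}}^2\,\frac{R(I_r^{(n)})}{R(n)}\right] < 1. \tag{12}$$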



(A2) Existence and equality of moments. $\mathbb{E}[\|X_n\|^2] < \infty$ for all $n$, and $\mathbb{E}[X_{n_1}(t)] = \mathbb{E}[X_{n_2}(t)]$ for all $n_1, n_2 \in \mathbb{N}$, $t \in [0,1]$.



(A3) Existence of a continuous solution. There exists a solution $X$ of the fixed-point equation

$$X \stackrel{d}{=} \sum_{r=1}^{K} A_r\big(X^{(r)}\big) + b \tag{13}$$

with continuous paths, $\mathbb{E}[\|X\|^2] < \infty$ and $\mathbb{E}[X(t)] = \mathbb{E}[X_1(t)]$ for all $t \in [0,1]$. Again, $(A_1, \dots, A_K, b), X^{(1)}, \dots, X^{(K)}$ are independent and $X^{(1)}, \dots, X^{(K)}$ are distributed like $X$.



(A4) Perturbation condition. $X_n = W_n + h_n$, where $\|h_n - h\| \to 0$ with $h \in \mathcal{D}[0,1]$, and $W_n$ are random variables in $\mathcal{D}[0,1]$ such that there exists a sequence $(r_n)$ with, as $n \to \infty$,

$$\mathbb{P}\big(W_n \notin \mathcal{D}_{r_n}[0,1]\big) \to 0.$$

Here, $\mathcal{D}_{r_n}[0,1] \subset \mathcal{D}[0,1]$ denotes the set of functions on the unit interval for which there is a decomposition of $[0,1]$ into intervals of length at least $r_n$ on which they are constant.

(A5) Rate of convergence. $R(n) = o\big(\log^{-m}(1/r_n)\big)$.

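The crucial part that makes everything work consists in choosing a probability metric in such a way that the limiting map is indeed a contraction. The contraction method presented here for the Banach space $(\mathcal{D}[0,1], \|\cdot\|)$ is based on the Zolotarev metric $\zeta_s$, and, for our fixed-point equation, we indeed obtain contraction with $s = 2$. This follows from our modified assumption (A1), since

$$\mathbb{E}\left[\sum_{r=1}^{K}\|A_r\|_{\mathrm{op}}^2\right] = \lim_{n\to\infty}\mathbb{E}\left[\sum_{r=1}^{K}\|A_r^{(n)}\|_{\mathrm{op}}^2\right] \le \limsup_{n\to\infty}\mathbb{E}\left[\sum_{r=1}^{K}\|A_r^{(n)}\|_{\mathrm{op}}^2\,\frac{R(I_r^{(n)})}{R(n)}\right] < 1,$$

where the middle inequality holds because $R$ is non-increasing and $I_r^{(n)} \le n - 1$, so that $R(I_r^{(n)})/R(n) \ge 1$.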



The amount of detail to be verified prevents us from providing a complete proof of all the assumptions in the present case. In the remainder of the section, we will not come back to the method and Proposition 7 itself, but show how it can be applied; we will, however, discuss and outline the proof of the main assumptions (A1), (A2), (A3) and (A5).

Proposition 7. Let $X_n$ fulfill (10). Provided that Assumptions (A1)-(A3) are satisfied, the solution $X$ of the fixed-point equation (13) is unique, and:

i. For all $t \in [0,1]$, $X_n(t) \to X(t)$ in distribution, with convergence of the first two moments;

ii. If $U$ is independent of $(X_n)$ and $X$, and distributed on $[0,1]$, then $X_n(U) \to X(U)$ in distribution, again with convergence of the first two moments;

iii. If also (A4) and (A5) hold, then $X_n \to X$ in distribution in $(\mathcal{D}[0,1], \|\cdot\|)$.



3.2 Existence of a continuous solution 

In this section, we outline the proof of the existence of a continuous process $Z$ that satisfies the distributional fixed-point equation (9), as needed for assumption (A3). We construct the process $Z$ as the pointwise limit of martingales. We then show that the convergence is actually almost surely uniform, which allows us to conclude that $Z$ is continuous with probability one. Write $C[0,1]$ for the space of continuous functions on $[0,1]$.

Consider the infinite 4-ary tree $\mathcal{T} = \bigcup_{n\ge 0}\{1,2,3,4\}^n$. For a node $u \in \mathcal{T}$, we write $|u|$ for its depth, i.e., the distance between $u$ and the root $\varnothing$. The descendants of $u \in \mathcal{T}$ correspond to all the words in $\mathcal{T}$ with prefix $u$. Let $\{U_v, v \in \mathcal{T}\}$ and $\{V_v, v \in \mathcal{T}\}$ be two independent families of i.i.d. $[0,1]$-uniform random variables.



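Construction by iteration. Define the operator $G: (0,1)^2 \times C[0,1]^4 \to C[0,1]$ by

$$G(x, y, f_1, f_2, f_3, f_4)(s) = \mathbf{1}_{\{s<x\}}\left[(xy)^{\beta} f_1\!\left(\frac{s}{x}\right) + (x(1-y))^{\beta} f_2\!\left(\frac{s}{x}\right)\right] + \mathbf{1}_{\{s\ge x\}}\left[((1-x)y)^{\beta} f_3\!\left(\frac{s-x}{1-x}\right) + ((1-x)(1-y))^{\beta} f_4\!\left(\frac{s-x}{1-x}\right)\right]. \tag{14}$$

Let $h$ be the map defined by $h(s) = (s(1-s))^{\beta/2}$, where $2\beta = \sqrt{17} - 3$. For every node $u \in \mathcal{T}$, let $Z_0^u = h$. Then define recursively

$$Z_{n+1}^u = G\big(U_u, V_u, Z_n^{u1}, Z_n^{u2}, Z_n^{u3}, Z_n^{u4}\big). \tag{15}$$

Finally, define $Z_n = Z_n^{\varnothing}$ to be the value observed at the root of $\mathcal{T}$ when the iteration has been started with $h$ in all the nodes at level $n$.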



A series representation for $Z_n$. For $s \in [0,1]$, $Z_n(s)$ is the sum of exactly $2^n$ terms, each one being the contribution of one of the boxes at level $n$ that is cut by the line at $s$. Let $\{Q_i^n(s), 1 \le i \le 2^n\}$ be the set of rectangles at level $n$ whose projection on the first coordinate contains $s$, and suppose that this projection of $Q_i^n(s)$ is the interval $[\ell_i^n, r_i^n]$. Then

$$Z_n(s) = \sum_{i=1}^{2^n}\mathrm{Leb}(Q_i^n(s))^{\beta}\, h\!\left(\frac{s-\ell_i^n}{r_i^n-\ell_i^n}\right), \tag{16}$$

where $\mathrm{Leb}(Q_i^n(s))$ denotes the volume of the rectangle $Q_i^n(s)$. The difference between $Z_n$ and $Z_{n+1}$ only lies in the functions appearing in the boxes $Q_i^n(s)$: we have

$$Z_{n+1}(s) - Z_n(s) = \sum_{i=1}^{2^n}\mathrm{Leb}(Q_i^n(s))^{\beta}\,\Big[G(U_i^n, V_i^n, h, h, h, h) - h\Big]\!\left(\frac{s-\ell_i^n}{r_i^n-\ell_i^n}\right), \tag{17}$$

where $U_i^n, V_i^n$, $1 \le i \le 2^n$, are i.i.d. $[0,1]$-uniform random variables. In fact, $U_i^n$ and $V_i^n$ are some of the variables $U_u, V_u$ for nodes $u$ at level $n$. Observe that, although $\mathrm{Leb}(Q_i^n(s))$ is not a product of $n$ independent terms of the form $UV$ because of size-biasing, $U_i^n, V_i^n$ are in fact unbiased, i.e., uniform. Let $\mathcal{F}_n$ denote the $\sigma$-algebra generated by $\{U_u, V_u : |u| < n\}$. Then the family $\{U_i^n, V_i^n : 1 \le i \le 2^n\}$ is independent of $\mathcal{F}_n$.
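The construction (15) and the representation (16) translate directly into a sampler. In the Python sketch below (our illustration; the class and method names are hypothetical), the variables $U_u, V_u$ are attached to words $u$ over $\{1,2,3,4\}$ and drawn lazily, so that $Z_n(s)$ for different values of $s$, and for different $n$, is computed on the same random tree, as in the martingale coupling; only the $2^n$ boxes met by the line at $s$ are ever visited.

```python
import math
import random

beta = (math.sqrt(17) - 3) / 2


def h(s: float) -> float:
    """Starting function h(s) = (s(1 - s))^(beta/2) = E[Z(s)]."""
    return (s * (1 - s)) ** (beta / 2)


class IteratedTreeSketch:
    """Hypothetical helper: Z_n from (15), with (U_u, V_u) attached lazily to words u
    over {1, 2, 3, 4}, so all evaluations share one random 4-ary tree."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.uv = {}  # word (tuple of ints) -> (U_u, V_u)

    def _uv(self, word: tuple) -> tuple:
        if word not in self.uv:
            self.uv[word] = (self.rng.random(), self.rng.random())
        return self.uv[word]

    def z(self, n: int, s: float, word: tuple = ()) -> float:
        """Z_n^word(s): apply G at `word`, recursing n levels down, with h at the bottom."""
        if n == 0:
            return h(s)
        u, v = self._uv(word)
        if s < u:
            t = s / u
            return ((u * v) ** beta * self.z(n - 1, t, word + (1,))
                    + (u * (1 - v)) ** beta * self.z(n - 1, t, word + (2,)))
        t = (s - u) / (1 - u)
        return (((1 - u) * v) ** beta * self.z(n - 1, t, word + (3,))
                + ((1 - u) * (1 - v)) ** beta * self.z(n - 1, t, word + (4,)))


proc = IteratedTreeSketch(seed=42)
path = [proc.z(8, k / 100) for k in range(101)]  # one approximate sample path of Z
```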

A martingale. Let $s \in [0,1]$ be fixed. We show that the sequence $(Z_n(s), n \ge 0)$ is a non-negative discrete-time martingale; so it converges with probability one to a finite limit $Z(s)$. To prove that $Z_n(s)$ is indeed a martingale, it suffices to prove that, for $1 \le i \le 2^n$,

$$\mathbb{E}\left[G(U_i^n, V_i^n, h, h, h, h)\!\left(\frac{s-\ell_i^n}{r_i^n-\ell_i^n}\right)\,\middle|\,\mathcal{F}_n\right] = h\!\left(\frac{s-\ell_i^n}{r_i^n-\ell_i^n}\right).$$

Since $U_i^n, V_i^n$, $1 \le i \le 2^n$, are independent of $\mathcal{F}_n$, this clearly reduces to the following lemma.

Lemma 8. For the operator $G$ defined in (14), $U, V$ two independent $[0,1]$-uniform random variables, and any $s \in [0,1]$, we have $\mathbb{E}[G(U, V, h, h, h, h)(s)] = h(s)$.
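Lemma 8 can be checked by direct integration. On $\{s<U\}$ the two terms of (14) combine into $U^{\beta}(V^{\beta} + (1-V)^{\beta})\,h(s/U)$ with $\mathbb{E}[V^{\beta} + (1-V)^{\beta}] = 2/(\beta+1)$, and $u^{\beta}h(s/u) = s^{\beta/2}(u-s)^{\beta/2}$; together with the symmetric term this gives $4h(s)/((\beta+1)(\beta+2))$, which equals $h(s)$ because $(\beta+1)(\beta+2) = \beta^2 + 3\beta + 2 = 4$. The Python sketch below (our illustration; names hypothetical) confirms this numerically, integrating $V$ out exactly and $U$ by the midpoint rule.

```python
import math

beta = (math.sqrt(17) - 3) / 2


def h(s: float) -> float:
    return (s * (1 - s)) ** (beta / 2)


def midpoint(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    w = (b - a) / n
    return w * sum(f(a + (k + 0.5) * w) for k in range(n))


def expected_G(s: float) -> float:
    """E[G(U, V, h, h, h, h)(s)] with V integrated out exactly:
    E[V^beta + (1 - V)^beta] = 2 / (beta + 1)."""
    left = midpoint(lambda u: u ** beta * h(s / u), s, 1.0)                      # s < U
    right = midpoint(lambda u: (1 - u) ** beta * h((s - u) / (1 - u)), 0.0, s)   # s >= U
    return 2 / (beta + 1) * (left + right)
```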



Almost sure continuity. Assume for the moment that there exist constants $a, b \in (0,1)$ and $C > 0$ such that

$$\mathbb{P}\Big(\sup_{s\in[0,1]}|Z_{n+1}(s) - Z_n(s)| > a^n\Big) \le C\, b^n. \tag{18}$$

Then, by the Borel-Cantelli lemma, the sequence $(Z_n)$ is almost surely Cauchy with respect to the supremum norm. Completeness of $(C[0,1], \|\cdot\|)$ yields the existence of a random process $Z$ with continuous paths such that $Z_n \to Z$ uniformly on $[0,1]$. We now move on to showing that there exist constants $a$ and $b$ such that (18) is satisfied. We start with a bound for a fixed value $s \in [0,1]$.

Lemma 9. For every $s \in [0,1]$, any $a \in (0,1)$, and any integer $n$ large enough, we have the bound

$$\mathbb{P}\big(|Z_{n+1}(s) - Z_n(s)| > a^n\big) \le 4\,\big(16e\log(1/a)\big)^n.$$

Then, in order to handle the supremum over $s \in [0,1]$ in (18), note that the number of values taken by $Z_n$ is at most the number of boxes at level $n$, i.e., $4^n$. To avoid unnecessary technicalities, we use fixed points (many more than $4^n$) to control the extent of $\sup_{s\in[0,1]}|Z_{n+1}(s) - Z_n(s)|$. Consider the set $\mathcal{V}_n$ of $x$-coordinates of the vertical boundaries of all the rectangles at level $n$. Let $L_n = \inf\{|x-y| : x, y \in \mathcal{V}_n,\ x \ne y\}$. Then, on the event that $L_n > \gamma^n$, we have

$$\sup_{s\in[0,1]}|Z_{n+1}(s) - Z_n(s)| \le \sup_{1\le i\le\lfloor\gamma^{-n}\rfloor}|Z_{n+1}(i\gamma^n) - Z_n(i\gamma^n)|.$$

In particular, it follows by the union bound that, for any $\gamma \in (0,1)$,

$$\mathbb{P}\Big(\sup_{s\in[0,1]}|Z_{n+1}(s) - Z_n(s)| > a^n\Big) \le \gamma^{-n}\sup_{s\in[0,1]}\mathbb{P}\big(|Z_{n+1}(s) - Z_n(s)| > a^n\big) + \mathbb{P}(L_n < \gamma^n).$$

The following lemma then yields (18), which completes the proof.

Lemma 10. For any positive real number $\gamma$ small enough, there exists an integer $n_1(\gamma)$ with

$$\mathbb{P}(L_n < \gamma^n) \le 6\cdot 4^n\,\gamma^{n/201}, \qquad n \ge n_1(\gamma).$$

3.3 Uniform convergence of the mean 

The proof of Theorem 1 requires showing the uniform convergence on $[0,1]$ of the first moment $n^{-\beta}\mathbb{E}[C_n(s)]$ towards $\mu_1(s) = K_1(s(1-s))^{\beta/2}$, in order to verify assumption (A1), in particular the rate $R(n)$ in (11). Note that, since $C_n(s)$ is almost surely continuous at any fixed $s$, the function $s \mapsto \mathbb{E}[C_n(s)]$ is continuous for any $n$. Curien and Joseph [4] only show pointwise convergence, and proving uniform convergence requires a good deal of additional arguments.

The first step is to prove a Poissonized version; the fixed-$n$ version is then obtained by a routine Tauberian argument. Consider a Poisson point process with unit intensity on $[0,1]^2 \times [0,\infty)$. The first two coordinates represent the location inside the unit square; the third one represents the time of arrival of the point. Let $P_t(s)$ denote the partial match cost for a query at $x = s$ in the quadtree built from the points that have arrived by time $t$.

Proposition 11. There exists $\varepsilon > 0$ such that

$$\sup_{s\in[0,1]}\big|t^{-\beta}\mathbb{E}[P_t(s)] - \mu_1(s)\big| = O(t^{-\varepsilon}).$$

The proof of Proposition 11 relies crucially on two main ingredients: first, a strengthening of the arguments developed by Curien and Joseph [4], and second, the speed of convergence of $n^{-\beta}\mathbb{E}[C_n(\xi)]$ to $\mathbb{E}[\mu_1(\xi)]$ for a uniform query line $\xi$, see (2), by Chern and Hwang [3]. By symmetry, we write for any $\delta \in (0, 1/2)$

$$\sup_{s\in[0,1]}\big|t^{-\beta}\mathbb{E}[P_t(s)] - \mu_1(s)\big| \le \sup_{s\le\delta}\big|t^{-\beta}\mathbb{E}[P_t(s)] - \mu_1(s)\big| + \sup_{s\in(\delta,1/2]}\big|t^{-\beta}\mathbb{E}[P_t(s)] - \mu_1(s)\big|. \tag{19}$$

The two terms in the right-hand side above are controlled by the following two lemmas. 

Lemma 12 (Behavior on the edge). There exists a constant $C_1$ such that

$$\limsup_{t\to\infty}\sup_{s\le\delta}\big|t^{-\beta}\mathbb{E}[P_t(s)] - \mu_1(s)\big| \le C_1\,\delta^{\beta/2}. \tag{20}$$

Lemma 13 (Behavior away from the edge). There exist constants $C_2, C_3, \eta$ with $0 < \eta < \beta$ and $\gamma \in (0,1)$ such that, for any integer $k$ and real number $\delta \in (0, 1/2)$, we have, for any $t > 0$,

$$\sup_{s\ge\delta}\big|t^{-\beta}\mathbb{E}[P_t(s)] - \mu_1(s)\big| \le C_2\,\delta^{-1}(1-\gamma)^k + C_3\,k\,2^k\,t^{-(\beta-\eta)/2}.$$

Behaviour along the edge. The behaviour away from the edge is rather involved, and we do not describe how the bound in Lemma 13 is obtained. To deal with the term involving the values of $s \in [0,\delta]$, we relate the value $\mathbb{E}[P_t(s)]$ to $\mathbb{E}[P_t(\delta)]$. Note that the limit first moment $\mu_1(s) = \lim_{t\to\infty} t^{-\beta}\mathbb{E}[P_t(s)]$ is monotonic for $s \in [0, 1/2]$. It seems, at least intuitively, that for any fixed real number $t > 0$, $\mathbb{E}[P_t(s)]$ should also be monotonic for $s \in [0, 1/2]$, but we were unable to prove it. The following weaker version is sufficient for our purpose.

Proposition 14 (Almost monotonicity). For any $s < 1/2$ and $\varepsilon \in [0, 1-2s)$, we have

$$\mathbb{E}[P_t(s)] \le \mathbb{E}\left[P_{t(1+\varepsilon)}\!\left(\frac{s+\varepsilon}{1+\varepsilon}\right)\right].$$

4 Second moment and supremum 

In this section, we obtain explicit expressions for the limit, showing that our general approach also yields effective and computable results.

Variance of the cost. We first focus on the result in Theorem 3. Our main result implies the convergence $(K_1 n^{\beta})^{-2}\,\mathbb{E}[C_n(s)^2] \to \mathbb{E}[Z(s)^2]$. Write $h(s) = \mathbb{E}[Z(s)] = (s(1-s))^{\beta/2}$. Taking second moments in (9) (the cross terms contribute through $\mathbb{E}[V^{\beta}(1-V)^{\beta}] = B(\beta+1,\beta+1)$ and $\mathbb{E}[Z(s)] = h(s)$) and writing the result as an integral equation for $\mu_2(s) = \mathbb{E}[Z(s)^2]$ yields, for every $s \in [0,1]$,

$$\mu_2(s) = \frac{2}{2\beta+1}\left[\int_s^1 x^{2\beta}\,\mu_2\!\left(\frac{s}{x}\right)dx + \int_0^s (1-x)^{2\beta}\,\mu_2\!\left(\frac{s-x}{1-x}\right)dx\right] + 2B(\beta+1,\beta+1)\,\frac{h(s)^2}{\beta+1}.$$

One easily verifies that the function $f$ given by $f(s) = c_2 h(s)^2$ solves the above equation provided that the constant $c_2$ satisfies

$$c_2 = \frac{2c_2}{(2\beta+1)(\beta+1)} + \frac{2B(\beta+1,\beta+1)}{\beta+1}, \qquad \text{that is,} \qquad c_2 = 2B(\beta+1,\beta+1)\,\frac{2\beta+1}{3(1-\beta)},$$

since $\beta^2 = 2 - 3\beta$. So if we were sure that $\mu_2(s)$ is indeed $c_2 h(s)^2$, we would have by integration

$$\mathrm{Var}(Z(\xi)) = c_2\, B(\beta+1,\beta+1) - B(\beta/2+1,\beta/2+1)^2.$$

To complete the proof, it suffices to show that the integral equation satisfied by $\mu_2$ actually admits a unique solution. To this aim, we show that the map $K$ that sends a function $f$ to the right-hand side of the integral equation above (with $f$ in place of $\mu_2$) is a contraction for the supremum norm (the details are omitted).
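Both facts used above can be checked numerically. Assuming that $K$ denotes the map sending $f$ to the right-hand side of the integral equation for $\mu_2$, one simple bound is $|Kf(s) - Kg(s)| \le \frac{2}{2\beta+1}\big[\int_s^1 x^{2\beta}dx + \int_0^s (1-x)^{2\beta}dx\big]\,\|f-g\|$, whose supremum over $s$ is attained at $s = 1/2$ and equals $4(1-2^{-(2\beta+1)})/(2\beta+1)^2 \approx 0.684$. The Python sketch below (our illustration, with hypothetical names) evaluates this bound, checks by midpoint quadrature that $c_2 h^2$ is a fixed point of $K$, and iterates $K$ on functions of the form $a\,h^2$, on which it acts as $a \mapsto qa + r$.

```python
import math

beta = (math.sqrt(17) - 3) / 2
B11 = math.gamma(beta + 1) ** 2 / math.gamma(2 * beta + 2)   # B(beta + 1, beta + 1)
c2 = 2 * B11 * (2 * beta + 1) / (3 * (1 - beta))


def h(s):
    return (s * (1 - s)) ** (beta / 2)


def midpoint(f, a, b, n=20_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    if b <= a:
        return 0.0
    w = (b - a) / n
    return w * sum(f(a + (k + 0.5) * w) for k in range(n))


def K_map(f, s):
    """Right-hand side of the integral equation for mu_2, with f in place of mu_2, at s."""
    integral = (midpoint(lambda x: x ** (2 * beta) * f(s / x), s, 1.0)
                + midpoint(lambda x: (1 - x) ** (2 * beta) * f((s - x) / (1 - x)), 0.0, s))
    return 2 / (2 * beta + 1) * integral + 2 * B11 * h(s) ** 2 / (beta + 1)


def mu2_candidate(s):
    """The claimed solution c_2 h(s)^2."""
    return c2 * h(s) ** 2


# Sup-norm Lipschitz bound for K, with p = 2*beta + 1 (supremum attained at s = 1/2)
p = 2 * beta + 1
lipschitz_bound = 4 * (1 - 2 ** (-p)) / p ** 2

# On functions a * h^2, K acts as a -> q * a + r; iterating from a = 0 converges to c2.
q = 2 / ((2 * beta + 1) * (beta + 1))
r = 2 * B11 / (beta + 1)
a = 0.0
for _ in range(60):
    a = q * a + r
```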

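Cost of the worst query. The convergence in distribution of $C_n(\cdot)/(K_1 n^{\beta})$ to the process $Z(\cdot)$ in the uniform topology directly implies (by the continuous mapping theorem) the first claim of Theorem 5,

$$\frac{S_n}{K_1 n^{\beta}} = \sup_{s\in[0,1]}\frac{C_n(s)}{K_1 n^{\beta}} \xrightarrow{d} \sup_{s\in[0,1]} Z(s) = S.$$

The convergence in the Zolotarev metric $\zeta_2$, on which the contraction method is based here, is strong enough to imply convergence of the first two moments of $S_n/(K_1 n^{\beta})$ to the corresponding moments of $S$.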

5 Concluding remarks 

The method we presented here to obtain refined results about the costs of partial match queries in quadtrees also applies to other geometric data structures based on the divide-and-conquer approach. In particular, similar results can be obtained for the k-d trees of Bentley [1] or the relaxed k-d trees of Duch et al. [7].

We conclude by mentioning some open questions. The supremum of the process is of great interest since it upper bounds the cost of any query. Can one identify the moments of the supremum $\sup_{s\in[0,1]} Z(s)$ (first and second)? In the course of our proof, we had to construct a continuous solution of the fixed-point equation. We prove convergence in distribution, but conjecture that the convergence actually holds almost surely.